
    Feature Selection in k-Median Clustering

    An effective method for selecting features when clustering unlabeled data is proposed, based on modifying the objective function of the standard k-median clustering algorithm. The modification perturbs the objective by a term that drives the medians of each of the k clusters toward the (shifted) global median of zero for the entire dataset. As the perturbation parameter is increased, more and more features are automatically driven toward the global zero median and eliminated from the problem, until one last feature remains. An error curve for unlabeled-data clustering as a function of the number of features used gives the reduced-feature clustering error relative to the "gold standard" of full-feature clustering. This clustering error curve parallels a classification error curve based on the real data labels. This justifies using the former error curve, computable for unlabeled data, to choose a reduced number of features that achieves correctness comparable to that obtained with the full set of original features. For example, on the 3-class Wine dataset, clustering with 4 selected input-space features comes within 4% of clustering with the original 13 features.
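
    The perturbation idea can be illustrated with a minimal sketch (all function names and parameter choices below are hypothetical, not the paper's implementation). After shifting the data so the global median is zero, each median coordinate minimizes sum_i |v_i - c| + lam*|c|; this objective is piecewise linear, so its minimum lies at a data value or at zero, and a larger lam zeroes out more coordinates, eliminating the corresponding features.

```python
import numpy as np

def perturbed_median(vals, lam):
    # minimizer of sum_i |v_i - c| + lam*|c| over c; the objective is
    # piecewise linear and convex, so the minimum is at a data value or at 0
    cands = np.append(vals, 0.0)
    objs = [np.abs(vals - c).sum() + lam * abs(c) for c in cands]
    return cands[int(np.argmin(objs))]

def perturbed_kmedian(X, k, lam, iters=20):
    # k-median with an l1 perturbation pulling cluster medians toward the
    # (shifted) global zero median; features whose medians are all zero drop out
    X = X - np.median(X, axis=0)          # shift the global median to zero
    C = X[:k].copy()                      # naive deterministic initialization
    for _ in range(iters):
        # assign each point to the nearest median in the 1-norm
        labels = np.argmin(np.abs(X[:, None, :] - C[None]).sum(axis=2), axis=1)
        for l in range(k):
            pts = X[labels == l]
            if len(pts):
                C[l] = [perturbed_median(pts[:, j], lam) for j in range(X.shape[1])]
    kept = np.flatnonzero(np.abs(C).sum(axis=0) > 0)   # surviving features
    return C, labels, kept
```

    On toy data where only the first feature separates the clusters, a moderate lam keeps that feature while a large lam eliminates everything, reproducing the "error curve versus number of features" trade-off in miniature.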

    Data Mining via Support Vector Machines

    Support vector machines (SVMs) have played a key role in broad classes of problems arising in various fields. Much more recently, SVMs have become the tool of choice for problems arising in data classification and mining. This paper emphasizes some recent developments that the author and his colleagues have contributed, such as: generalized SVMs (a very general mathematical programming framework for SVMs), smooth SVMs (a smooth nonlinear equation representation of SVMs solvable by a fast Newton method), Lagrangian SVMs (an unconstrained Lagrangian representation of SVMs leading to an extremely simple iterative scheme capable of solving classification problems with millions of points), and reduced SVMs (a rectangular kernel classifier that utilizes as little as 1% of the data).

    Set Containment Characterization

    Characterization of the containment of a polyhedral set in a closed halfspace, a key step in generating knowledge-based support vector machine classifiers [7], is extended to the following: (i) containment of one polyhedral set in another; (ii) containment of a polyhedral set in a reverse-convex set defined by convex quadratic constraints; (iii) containment of a general closed convex set, defined by convex constraints, in a reverse-convex set defined by convex nonlinear constraints. The first two characterizations can be determined in polynomial time by solving m linear programs for (i) and m convex quadratic programs for (ii), where m is the number of constraints defining the containing set. In (iii), m convex programs need to be solved in order to verify the characterization, where again m is the number of constraints defining the containing set. All polyhedral sets, like the knowledge sets of support vector machine classifiers, are characterized by the intersection of a finite number of closed halfspaces.
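
    For case (i), the m-linear-programs test can be sketched as follows (a hedged illustration using scipy, not code from the paper): {x : Ax <= a} is contained in {x : Bx <= b} exactly when, for each row B_i of B, the maximum of B_i x over the first set does not exceed b_i.

```python
import numpy as np
from scipy.optimize import linprog

def polyhedron_contained(A, a, B, b, tol=1e-8):
    # {x: Ax <= a} is inside {x: Bx <= b} iff max{B_i x : Ax <= a} <= b_i
    # for every row i of B -- one LP per constraint of the containing set
    n = A.shape[1]
    for Bi, bi in zip(B, b):
        res = linprog(-Bi, A_ub=A, b_ub=a, bounds=[(None, None)] * n)
        if res.status == 3:                 # B_i x unbounded above: not contained
            return False
        if res.status == 0 and -res.fun > bi + tol:
            return False
    return True
```

    If the contained set is empty, every LP is infeasible and the containment holds vacuously, which matches the loop never returning False.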

    Privacy-Preserving Horizontally Partitioned Linear Programs

    We propose a simple privacy-preserving reformulation of a linear program whose equality constraint matrix is partitioned into groups of rows. Each group of matrix rows and its corresponding right-hand side vector are owned by a distinct private entity that is unwilling to share or make public its row group or right-hand side vector. By multiplying each privately held constraint group by an appropriately generated and privately held random matrix, the original linear program is transformed into an equivalent one that does not reveal any of the privately held data. The solution vector of the transformed secure linear program is publicly generated and is available to all entities.
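
    The masking idea can be sketched on a toy problem (illustrative numbers; square invertible masks are one simple choice consistent with the description): each entity replaces its block (A_i, b_i) by (T_i A_i, T_i b_i) for a private random nonsingular T_i, which leaves the feasible set, and hence the public solution, unchanged.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# two entities, each owning one group of equality constraints of
# min c'x  s.t.  Ax = b, x >= 0
A1, b1 = np.array([[1.0, 1.0, 1.0]]), np.array([6.0])
A2, b2 = np.array([[1.0, -1.0, 0.0]]), np.array([0.0])
c = np.array([1.0, 2.0, 3.0])

def mask(Ai, bi):
    # each entity privately multiplies its block by a random nonsingular matrix
    m = Ai.shape[0]
    T = rng.standard_normal((m, m))
    while abs(np.linalg.det(T)) < 1e-6:     # ensure invertibility
        T = rng.standard_normal((m, m))
    return T @ Ai, T @ bi

A1m, b1m = mask(A1, b1)
A2m, b2m = mask(A2, b2)

orig = linprog(c, A_eq=np.vstack([A1, A2]), b_eq=np.hstack([b1, b2]))
masked = linprog(c, A_eq=np.vstack([A1m, A2m]), b_eq=np.hstack([b1m, b2m]))
```

    The masked program is solved publicly, yet only T_i A_i and T_i b_i ever leave each entity; both programs return the same solution vector.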

    Absolute Value Equation Solution via Dual Complementarity

    By utilizing a dual complementarity condition, we propose an iterative method for solving the NP-hard absolute value equation (AVE): Ax − |x| = b, where A is an n × n square matrix. The algorithm makes no assumptions on the AVE other than solvability and consists of solving a succession of linear programs. The algorithm was tested on 500 consecutively generated random solvable instances of the AVE with n = 10, 50, 100, 500, and 1,000. The algorithm solved 90.2% of the test problems to an accuracy of 10^-8.
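
    The paper's dual-complementarity algorithm is not reproduced here, but the AVE itself can be illustrated by a generic successive-linearization sketch: freeze s = sign(x), solve the linear system (A − diag(s))x = b, and repeat. When the singular values of A exceed 1, every matrix A − diag(s) with |s_i| <= 1 is invertible, so each step is well defined.

```python
import numpy as np

def solve_ave(A, b, iters=50):
    # generic sign-iteration sketch for the AVE  Ax - |x| = b
    # (an illustration only, not the paper's dual-complementarity method)
    x = np.linalg.solve(A, b)              # initial sign pattern from A^{-1} b
    for _ in range(iters):
        s = np.sign(x)
        x_new = np.linalg.solve(A - np.diag(s), b)
        if np.allclose(x_new, x):          # sign pattern stabilized
            break
        x = x_new
    return x
```

    On well-conditioned instances the sign pattern typically stabilizes in a few iterations; the residual Ax − |x| − b certifies whether a solution was found.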

    A Newton Method for Linear Programming

    A fast Newton method is proposed for solving linear programs with a very large (~10^6) number of constraints and a moderate (~10^2) number of variables. Such linear programs occur in data mining and machine learning. The proposed method is based on the apparently overlooked fact that the dual of an asymptotic exterior penalty formulation of a linear program provides an exact least 2-norm solution to the dual of the linear program for finite values of the penalty parameter, but not a primal solution unless the parameter approaches zero. However, the exact least 2-norm solution to the dual problem can be used to generate an accurate primal solution if m ≫ n and the primal solution is unique. Utilizing these facts, a fast, globally convergent, finitely terminating Newton method is proposed. A simple prototype of the method is given in eleven lines of MATLAB code. Encouraging computational results are presented, such as the solution of a linear program with two million constraints that could not be solved by CPLEX 6.5 on the same machine.
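
    The key fact can be sketched for the primal form min c'x s.t. Ax >= b (an illustrative toy, not the paper's eleven-line MATLAB prototype): minimize the exterior penalty eps*c'x + 0.5*||(b − Ax)_+||^2 by a generalized Newton iteration; (b − Ax)_+/eps then recovers the least 2-norm dual solution exactly for a finite eps, while the primal iterate is only approximate.

```python
import numpy as np

def lp_newton(A, b, c, eps=1e-3, delta=1e-6, tol=1e-10, max_iter=100):
    # generalized Newton method on f(x) = eps*c'x + 0.5*||(b - A x)_+||^2,
    # an exterior penalty formulation of the LP  min c'x  s.t.  A x >= b
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(max_iter):
        plus = np.maximum(b - A @ x, 0.0)       # violated-constraint residuals
        grad = eps * c - A.T @ plus
        if np.linalg.norm(grad) < tol:
            break
        act = (b - A @ x > 0).astype(float)     # active rows for the Hessian
        H = A.T @ (act[:, None] * A) + delta * np.eye(n)
        x -= np.linalg.solve(H, grad)
    u = np.maximum(b - A @ x, 0.0) / eps        # least 2-norm dual estimate
    return x, u
```

    Each Newton step solves only an n × n system even when A has millions of rows, which is the source of the speed claimed in the abstract.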

    Knowledge-Based Linear Programming

    We introduce a class of linear programs with constraints in the form of implications. Such linear programs arise in support vector machine classification, where, in addition to explicit datasets to be classified, prior knowledge such as an expert's experience in the form of logical implications is imposed on the classifier. The overall problem can be viewed either as a semi-infinite linear program or as a linear program with equilibrium constraints; in either case it can be solved by an equivalent simple linear program under mild assumptions.
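
    A single implication constraint can be made linear via LP duality (a hedged sketch with made-up numbers): for a nonempty knowledge set, "Bz <= d implies h'z <= alpha" holds iff there exists u >= 0 with B'u = h and d'u <= alpha, which is linear in (u, alpha) and can sit inside an ordinary linear program.

```python
import numpy as np
from scipy.optimize import linprog

# implication "Bz <= d  ==>  h'z <= alpha" replaced by linear constraints
# in multipliers u >= 0:  B'u = h,  d'u <= alpha.
# Demo: smallest alpha so the unit box {0 <= z <= 1} lies in h'z <= alpha.
B = np.array([[1.0, 0], [0, 1], [-1, 0], [0, -1]])
d = np.array([1.0, 1.0, 0.0, 0.0])
h = np.array([1.0, 1.0])

# variables: u (4 components) and alpha; minimize alpha
n_u = B.shape[0]
cost = np.hstack([np.zeros(n_u), [1.0]])
A_eq = np.hstack([B.T, np.zeros((2, 1))])            # B'u = h
A_ub = np.hstack([d, [-1.0]]).reshape(1, -1)         # d'u - alpha <= 0
res = linprog(cost, A_ub=A_ub, b_ub=[0.0], A_eq=A_eq, b_eq=h,
              bounds=[(0, None)] * n_u + [(None, None)])
alpha = res.x[-1]
```

    The optimum alpha = 2 gives the tightest halfspace x1 + x2 <= 2 containing the box, verifying the implication with a single linear program rather than infinitely many pointwise checks.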

    Primal-Dual Bilinear Programming Solution of the Absolute Value Equation

    We propose a finitely terminating primal-dual bilinear programming algorithm for the solution of the NP-hard absolute value equation (AVE): Ax − |x| = b, where A is an n × n square matrix. The algorithm, which makes no assumptions on the AVE other than solvability, consists of a finite number of linear programs terminating at a solution of the AVE or at a stationary point of the bilinear program. The proposed algorithm was tested on 500 consecutively generated random instances of the AVE with n = 10, 50, 100, 500, and 1,000. The algorithm solved 88.6% of the test problems to an accuracy of 10^-6.

    The Ill-Posed Linear Complementarity Problem

    A regularization of the linear complementarity problem (LCP) is proposed that leads to an exact solution if one exists; otherwise a minimizer of a natural residual of the problem is obtained. The regularized LCP (RLCP) turns out to be a linear program with equilibrium constraints (LPEC) that is always solvable. For the case when the underlying matrix M of the LCP is in the class Q0 (the LCP is solvable whenever it is feasible), the RLCP can be solved by a quadratic program, which is convex if M is positive semidefinite. An explicit exact penalty formulation of the RLCP is also given when M ∈ Q0, and an implicitly exact one otherwise. Error bounds on the distance from an arbitrary point to the set of LCP residual minimizers follow from LCP error bound theory. Computational algorithms for solving the RLCP consist of solving a convex quadratic program when M ∈ Q0, for which a potentially finitely terminating Frank-Wolfe method is proposed. For a completely general M, a parametric method is proposed wherein a Frank-Wolfe algorithm is carried out for each value of the parameter.
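
    The Frank-Wolfe approach can be sketched for a positive definite M (an illustration under stated assumptions, not the paper's parametric method): minimize the complementarity objective x'(Mx + q) over the feasible set {x >= 0, Mx + q >= 0}. Each iteration solves one linear program plus an exact line search, and the objective reaching zero certifies an LCP solution.

```python
import numpy as np
from scipy.optimize import linprog

def lcp_frank_wolfe(M, q, box=10.0, iters=50):
    # Frank-Wolfe on f(x) = x'(Mx + q) over {x >= 0, Mx + q >= 0};
    # f >= 0 on the feasible set and f = 0 exactly at an LCP solution.
    # A box bound keeps every LP subproblem bounded in this sketch.
    n = len(q)
    bounds = [(0.0, box)] * n
    x = linprog(np.zeros(n), A_ub=-M, b_ub=q, bounds=bounds).x   # feasible start
    for _ in range(iters):
        g = (M + M.T) @ x + q                     # gradient of f
        s = linprog(g, A_ub=-M, b_ub=q, bounds=bounds).x         # LP subproblem
        d = s - x
        curv = d @ (M @ d)                        # f(x + t*d) is quadratic in t
        t = 1.0 if curv <= 1e-15 else min(max(-(g @ d) / (2.0 * curv), 0.0), 1.0)
        x = x + t * d
    return x
```

    With M positive semidefinite the objective is convex and the quadratic-program view of the abstract applies directly; the exact line search is available in closed form because the objective is quadratic along each Frank-Wolfe direction.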